Using Articulatory Position Data to Improve Voice Transformation

Authors

  • Arthur Richard Toth
  • Richard Stern
  • Mosur Ravishankar
  • Simon King
Abstract

Voice transformation (also known as voice conversion or voice morphing) is a name given to techniques which take speech from one speaker as input and attempt to produce speech that sounds like it came from another speaker. One compelling argument for good voice transformation is that it reduces the difficulty in creating additional synthetic voices with new identities and styles once an existing voice has been created based on a full-sized corpus. There are further voice transformation applications for security, privacy, and assistive technologies. Although current voice transformation techniques perform well in the sense that humans typically judge transformed speech to sound more like the target speaker than the source speaker, there is still room for improvement. We investigate the use of articulatory position data to improve voice transformation. When a person speaks, motions of the articulators affect the shape of the vocal tract, which affects the produced sound. Recently, data that includes measurements of the positions of various articulators along with recordings of the produced speech has been made publicly available. This articulatory position data gives us new information about the production of speech and has already been used successfully to predict quantities such as Mel-frequency cepstral coefficients [Toda et al., 2004a]. Such data gives us a different source of information from typical features derived from speech signals and enables promising new approaches to voice transformation. One of the current challenges of using articulatory position data is that it is difficult to collect, so little is available. In order for it to be useful for more than a few speakers, some strategy must be devised to estimate it for other speakers. 
We present a number of techniques to do this and demonstrate that they are plausible by showing that artificial estimates of articulatory positions can be used to improve phonetic feature predictions, much as actual articulatory positions do. Then we proceed to the question of using articulatory position features for voice transformation. Modifying the voice transformation process and representation of the articulatory data enables us to show improvement according to an objective metric. Then we demonstrate that artificial articulatory position estimates can also be used to improve voice transformation for speakers for whom no articulatory position data has been collected, according to this same objective metric. As we are attempting to improve voice transformation, we give further consideration to what this actually means. Although a number of objective and subjective tests have been used to judge voice transformation quality, the best way to evaluate it is still an open question. We present new subjective and objective measures for voice transformation and report the results and our observations.
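
The abstract does not name its objective metric. Purely as an illustration, mel-cepstral distortion (MCD) is one measure commonly used to compare transformed speech against target speech; whether it is the metric used here is an assumption. The sketch below assumes the two utterances are already time-aligned frame by frame and that each frame is a list of mel-cepstral coefficients whose first entry (the energy term) is skipped. The function name `mel_cepstral_distortion` is hypothetical.

```python
import math


def mel_cepstral_distortion(converted, target):
    """Mean mel-cepstral distortion (dB) over time-aligned frames.

    Illustrative only: assumes both inputs are equal-length sequences of
    frames, each frame a sequence of mel-cepstral coefficients, and that
    coefficient 0 (energy) is excluded from the distance.
    """
    if len(converted) != len(target):
        raise ValueError("frame sequences must be time-aligned (same length)")
    if not converted:
        raise ValueError("at least one frame is required")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for conv_frame, tgt_frame in zip(converted, target):
        # Squared Euclidean distance over coefficients 1..D (skip energy).
        squared = sum((c - t) ** 2 for c, t in zip(conv_frame[1:], tgt_frame[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(converted)
```

Lower values mean the transformed frames lie closer to the target frames in the cepstral domain; identical sequences give 0.0.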


Related resources

Using articulatory position data in voice transformation

Articulatory position data is information about the location of various articulators in the vocal tract. One form of it has been made freely available in the MOCHA database [1]. This data is interesting in that it provides direct information on the production of speech, but there is the question of whether it actually provides information beyond what can be derived from the audio signal, which ...

Direct Speech Generation for a Silent Speech Interface based on Permanent Magnet Articulography

Patients with larynx cancer often lose their voice following total laryngectomy. Current methods for post-laryngectomy voice restoration are all unsatisfactory for different reasons: they require frequent replacement due to biofilm growth (tracheo-oesophageal valve), produce speech that sounds gruff and masculine (oesophageal speech) or robotic (electro-larynx) and, in general, are difficult to master (oeso...

Voice mimic system using an articulatory codebook for estimation of vocal tract shape

Voice mimic systems using articulatory codebooks require an initial estimate of the vocal tract shape in the vicinity of the global optimum. For this purpose, we need to gather a large set of corresponding articulatory and acoustic data in the articulatory codebook. Thus, searching and accessing the codebook becomes a difficult task. In this paper, the design of an articulatory codebook is presen...

Statistical acoustic-to-articulatory mapping unified with speaker normalization based on voice conversion

This paper proposes a model of speaker-normalized acoustic-to-articulatory mapping using statistical voice conversion. A mapping function from acoustic parameters to articulatory parameters is usually developed with a single speaker’s parallel data. Hence the constructed mapping model can work appropriately only for this specific speaker, and applying this model to other speakers degrades the pe...

Cross-speaker articulatory position data for phonetic feature prediction

Through the use of a device called an Electromagnetic Articulograph, it is possible to measure the locations of a person’s articulators during speech. As more of this data becomes available, one important question is how it can be used. In this paper, we demonstrate that it can improve performance for the recognition of some phonetic features. As articulatory position data is scarce, we also de...


Publication date: 2008